Project-Team:STARS

Inria | Raweb 2017 | Presentation of the Project-Team STARS | STARS Web Site



Section: New Results

Detection of Pedestrians Using Deep Learning

Participants : Ujwal Ujwal, Frederic Precioso, Nagi Aly, François Brémond.

keywords: Deep learning, CNN

Introduction

Although the problem of pedestrian detection shares many important characteristics with general object detection, its applications have much more practical and widespread ramifications. These include areas such as surveillance, monitoring and autonomous vehicles. Traditional approaches such as HoG-based detection [59], [60] and Deformable Parts Model (DPM) based detection [71], [102] have been reasonably successful. However, in the wake of recent interest in autonomous vehicles, where safety is paramount, it is pertinent to expect a much higher degree of performance from pedestrian detection systems.

The advent and popularity of deep learning invite us to investigate it in search of such a high-performance system. Deep learning has been very successful in more general object detection problems, as reflected by a large number of very successful systems. In our work, we focus on investigating deep learning for designing high-performance pedestrian detection systems.

This work has been done in collaboration with Bertrand Leroy (VEDECOM).

State-of-the-art investigations

This year, we continued our investigations into the state-of-the-art deep learning based systems which have been proposed for, or applied to, pedestrian detection. Deep learning still lacks strong theoretical foundations. Because the field is so largely practical and experimental, it is very important to investigate existing systems [75], [92], [140] through thorough experiments, in order to better comprehend their behavior in the wide variety of scenarios where pedestrian detection might be desired. Our investigations offered us insights such as the following:

Performance limitations of current systems: We were able to identify a number of important scenarios where present state-of-the-art systems falter in their detection performance. These primarily include the following instances:
- Small-scale People: This refers to people who are far away from the camera and thus appear small in size. We also refer to such instances as far-range people; they are often missed.
- Occluded People: People in urban environments are often occluded or semi-occluded by various entities, such as lamp-posts and vehicles.
- Seated People: People who are either in a sitting position or riding a vehicle are often missed. This effect is much more pronounced in combination with the previously mentioned case of small-scale people.

Suboptimal usage of CNN architectures: Convolutional Neural Network (CNN) architectures are the backbone of deep learning based object detection systems. CNNs are hierarchical layers of neurons (as in, for example, multi-layer perceptrons (MLP)), albeit with more involved operations. We observe that existing systems extract features from the last convolutional layer, so the lower layers of a CNN are utilized only implicitly. We consider this to be suboptimal, because we observed during our experiments that the lower layers of a CNN do detect some important features which may prove useful in scenarios such as small-scale people and occluded people.
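
One possible intuition for why lower layers may help with small-scale people is spatial resolution. This is an interpretation; the report itself does not state it. Each downsampling stage in a CNN shrinks the feature map, so a distant pedestrian covers very few cells by the last convolutional layer. The sketch below only illustrates that arithmetic. The function name, the 40-pixel pedestrian height, and the four stages with a downsampling factor of 2 are all hypothetical values chosen for illustration, not figures from our experiments.

```python
def feature_map_extent(object_height_px, num_downsamplings, factor=2):
    """Approximate height of an object, in feature-map cells,
    after `num_downsamplings` stages that each shrink the map by `factor`."""
    if object_height_px <= 0 or num_downsamplings < 0 or factor < 1:
        raise ValueError("invalid arguments")
    return object_height_px / (factor ** num_downsamplings)


# Hypothetical far-range pedestrian, 40 pixels tall in the input image,
# passed through a hypothetical network with four 2x downsampling stages.
heights = {stage: feature_map_extent(40, stage) for stage in range(5)}

for stage, cells in heights.items():
    print(f"after {stage} downsampling stage(s): {cells} cells tall")
```

Under these assumptions the pedestrian spans 40 cells at the input but only 2.5 cells after the fourth stage, which is one reason features taken from earlier layers might retain more usable detail for small instances.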

Outcome

Our investigations have clarified which aspects of the problem matter most, which lets us concentrate on the relevant parts of system design. We expect this to make our future work more productive.

Following these state-of-the-art studies, we plan to consolidate our findings in a review paper, which we aim to submit to a journal shortly.

Detection of small-scale people

As mentioned before, our state-of-the-art studies suggested that CNN architectures might be used in a suboptimal way. To take this investigation further, we worked on the design of a system which can make use of all the hierarchical levels of a CNN. We are still correcting some implementation issues in this system; nevertheless, in our first experiments it achieved a miss-rate of $13.98\%$, against the state-of-the-art miss-rate of $9\%$. Miss-rate refers to the proportion of pedestrian instances which were not detected (false negatives), so a lower miss-rate indicates a better-performing system. Although our miss-rate in these first experiments is roughly $5$ percentage points worse than the state of the art, we find it encouraging because we used a much smaller CNN, and a smaller CNN has a lower capacity for feature extraction. We believe that much better results may be achievable with a better-performing CNN.
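
To make the miss-rate definition and the comparison above concrete, here is a minimal sketch. It assumes the simplest reading of miss-rate, namely missed pedestrians divided by all ground-truth pedestrians. The evaluation protocol behind the two reported figures is not described here, and benchmark protocols often report miss-rate at a fixed false-positive rate or averaged over several, which this sketch does not model. The function name and the counts 200 and 180 are hypothetical; only the two percentages come from the text.

```python
def miss_rate(num_ground_truth, num_detected):
    """Fraction of ground-truth pedestrians that were not detected."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    if not 0 <= num_detected <= num_ground_truth:
        raise ValueError("num_detected must lie between 0 and num_ground_truth")
    false_negatives = num_ground_truth - num_detected
    return false_negatives / num_ground_truth


# Hypothetical counts, for illustration only: 180 of 200 pedestrians found.
example = miss_rate(num_ground_truth=200, num_detected=180)

# Miss-rates reported in the text, in percent.
reported_ours = 13.98
reported_state_of_the_art = 9.0

# Absolute gap in percentage points, and the same gap relative to the baseline.
gap_points = reported_ours - reported_state_of_the_art
relative_increase = gap_points / reported_state_of_the_art

print(f"example miss-rate: {example:.2%}")
print(f"gap: {gap_points:.2f} percentage points")
print(f"relative increase: {relative_increase:.1%}")
```

The gap is about 4.98 percentage points, which is the "roughly 5" quoted above; relative to the 9% baseline, it corresponds to a little over half again as many missed pedestrians.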

Outcome

Our work on this problem is now moving beyond our first experiments, which supported our conclusion that existing systems may use CNN architectures suboptimally. We are currently focused on a second set of experiments, which involve employing a better CNN and investigating the system's performance and behavior more exhaustively.
